ARTIFICIAL INTELLIGENCE AND NATURAL INTELLIGENCE

by John McCarthy, Professor of Computer Science, Stanford University


In this paper, I shall describe some of the work done under
the flag of artificial intelligence (AI) and discuss its relevance to
questions of natural intelligence.

Interaction between AI and the study of natural intelligence
is greatly affected by the primitive state of knowledge in both
fields. There are many tasks performed by human intelligence for
which we have no clear idea of a mechanism. Thus we know no
mechanism that will locate the people in the visual field and give
the present positions of their limbs. We know no mechanism that will
generate from a photograph of a person a generalized idea of his
appearance that will enable him to be recognized in the future. We
also don't know any way of representing a person's general knowledge
of airplanes, travel agents, hotels, etc. that will enable the
planning of a trip. Moreover, although we can mention a very large
number of specific tasks for which we have difficulty finding a
mechanism, we can't yet divide the process of intelligent behavior
into a comprehensive set of subprocesses that can be studied
separately.

Therefore, if AI research turns up a mechanism for doing one
of these tasks, the psychologist will be aided in his search for the
mechanism used by humans or animals. Conversely, an AI researcher who
is stumped in programming a task may benefit from any information
obtainable about how humans do it. The psychologist who has a theory
of how a task is performed will benefit from trying to program it.
Experience indicates that unless this is done the theory is likely to
remain so vague that it is uncertain whether a mechanism has really
been specified. The AI researcher can often tell immediately that a
proposed mechanism wouldn't work or even that no mechanism has really
been specified.

The most extensive work in correlating human performance with
that of computer programs has been done at Carnegie-Mellon University
by Allen Newell, Herbert Simon, and their collaborators. I cannot
summarize this work or the conclusions they have reached here.

Outsiders often ask what formal definition of intelligence is
used in AI research. Unfortunately, we don't have one; AI has had
to be defined as making computers solve problems that people are
called intelligent for solving. In the short run this is good
enough, because there are plenty of problems whose human solution
clearly requires intelligence and that we are having great difficulty
getting machines to solve. In the long run, the adjective
"intelligent" applied to behavior or systems may become a technical
term in a theory of intelligence, but no one in AI has yet attempted
to formulate a general theory of intelligence. At least there is
no such theory taken seriously by many workers in the field.

Serious work in AI didn't start until just after the first
stored-program computers appeared. Much of it then went into an area
I call heuristics, which is concerned with searching spaces of
possibilities for solutions to problems. The first problems
considered included games - chess, checkers, etc. - and mathematics -
theorem proving by manipulation of formulas in such domains as
Euclidean geometry and symbolic logic.

What was learned from this?

Not as much as one might hope, but something. We learned
less than we should have, because the writers of heuristic programs
haven't been very analytical about them. Mostly they belong to the
"look ma, no hands" school of artificial intelligence. They write a
program, try it out, and if they have to write a paper, they tell us
how the program works and how well it did. They don't tell us much
of general interest about artificial intelligence that was learned
from the particular task. However, there is much more to be learned
from the programs than the authors have so far made explicit.

First of all, we learned what was easy and what was
difficult. Thus a game called kalah is quite easy (we got a computer
program (Russell 1965) better than any human player); checkers is
harder, but Samuel's program (Samuel 196x) got some draws with the
U.S. champion till he learned its ways; chess is harder still (the
best program so far (Greenblatt 1967) plays class C chess); and the
Japanese game of go is so hard that the best program (Ryder 1971)
can't beat a novice.

The reason seems to be that these games differ in the
intellectual mechanisms required for successful play. When the game
requires intellectual mechanisms that we understand how to program,
then the programs do well. If, on the other hand, the mechanisms
used in high quality human play are not well understood, then it is
hard to make a good program. Experiments in programming computers to
play games help us identify and verify our understanding of
intellectual mechanisms.

Thus, in kalah there is no long-range strategy - there are
only tactical gimmicks for winning a few points. Moreover, there are
only a few moves in any position, and it is hard to eliminate any of
them as being without interest. Once these mechanisms are
programmed, the machine's speed wins. Checkers requires more of the
player. In the first place, there is some strategy - if I can hold
two of the opponent's men with one of mine and we both king all our
remaining men, I will win by outnumbering him on the rest of the
board. I can win a game using this principle without being able to
look ahead on the move tree to the point where my victory is
explicit. Nevertheless, there are relatively few moves, and a
computer can overcome our limited ability to program strategy by
looking ahead very far. Chess has many more moves in a given
position than checkers, so the computer cannot look very far ahead
unless it can severely restrict the class of moves worth looking at.
Humans do this by making plans and by considering only moves that
further plans or refute them. None of the present chess programs
generates plans worthy of the name. The game of go introduces still
another major complication. The number of moves in a position is
usually more than 100, so that straight look-ahead is of no use
whatsoever. Human play depends on our ability to analyze the
position into subpositions and to examine first the move trees of the
subpositions and then their interaction. This analysis is rather
subtle, and we can't program it well yet.

Another advantage that human players of chess and go have
over present computer programs lies in the way they consider the tree
of legal moves. Almost no present programs associate corresponding
moves on different branches of the tree, even though moving a given
piece to a given square may have similar properties in distinct
positions. In the case of good moves, this is not hard to do: a
move that has been shown to be good in one position may be tried
first in other positions. Under the name of the "killer heuristic",
this mechanism has been used to good effect in a chess mating
program. But consider the move P-R3 in the initial position in
chess. Present programs find that it has no special virtues, just as
humans do, and don't make it. However, after the program has decided
to investigate, say, P-K4 and an opponent's reply, the program
repeats its investigation of P-R3. In fact, the program may consider
P-R3 fifty times in deciding on its first move. The obvious solution
is to put it on a list of bad moves and not consider it further, but
events may occur that make it relevant again, so it has to be
arranged that certain events will trigger renewed consideration of
P-R3. No program along these lines has yet been written, and we
still observe that chess programs waste most of their time
considering moves that any human chess player can tell you are not
worthy of consideration.
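
The idea of carrying a good move from one branch to its
siblings can be sketched in a few lines. The following Python
fragment is only an illustration of the killer heuristic as described
above; the board interface (legal_moves) and the keying of the table
by search depth are invented for the sketch, not details of any of
the programs mentioned.

    killer_table = {}         # depth -> move that refuted a line there

    def ordered_moves(position, depth):
        """Try the remembered refutation first, if it is legal here."""
        moves = list(position.legal_moves())
        killer = killer_table.get(depth)
        if killer in moves:
            moves.remove(killer)
            moves.insert(0, killer)
        return moves

    def record_cutoff(move, depth):
        """Called when a move refuted a line; remember it for siblings."""
        killer_table[depth] = move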

One may draw another conclusion relevant to natural
intelligence: the solution methods used for some important classes
of problems are objective; they depend neither on heredity nor on
environment, because, being required for success, they must either
evolve with the species or be learned by the individual. This point
is illustrated by the αβ-heuristic used in game playing. When the
possibility of computer game playing was first discussed (Shannon
1950), it was proposed to use a minimax procedure for selecting the
right move to make: the move tree is examined to a certain depth,
terminating in a move of the machine's. Then values are assigned to
all terminal positions. Then values are assigned to the positions
immediately preceding the terminal positions by assigning to such a
position the best of the values of the terminal positions to which
moves exist from it. Values are assigned to the preceding positions
by taking the worst value of a position immediately accessible, since
the move at that point belongs to the machine's opponent. This
procedure is continued until a value is assigned to the initial
position. The first game-playing programs used this procedure or one
equivalent to it in amount of work. However, in 1956 I noticed that
if the program has found a move that gains an amount α, and in the
course of analyzing another move it finds an opponent's reply that
holds the gain to less than α, then there is no point in examining
the opponent's other moves. The αβ-heuristic is a generalization of
this observation, which was put into its final form by Edwards and
Hart (1961) and which was discovered independently by Brudno (1963).
A little thought convinces one of two points: first, every human
player uses αβ, including the programmers whose programs don't;
second, good game playing requires its use or something equivalent,
whether the player be man, machine, or Martian.
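
The procedure just described can be stated compactly in a
modern programming language. Here is a minimal Python sketch of
depth-limited minimax with the αβ cutoff; the position interface --
moves(), make(), is_terminal(), and value() -- is assumed for
illustration.

    def alphabeta(pos, depth, alpha, beta, maximizing):
        """Return the minimax value of pos, searching depth plies and
        pruning branches that cannot affect the move finally chosen."""
        if depth == 0 or pos.is_terminal():
            return pos.value()        # static evaluation of the position
        if maximizing:
            best = float('-inf')
            for m in pos.moves():
                best = max(best, alphabeta(pos.make(m), depth - 1,
                                           alpha, beta, False))
                alpha = max(alpha, best)
                if alpha >= beta:     # opponent already holds us below
                    break             # beta elsewhere: stop looking
            return best
        else:
            best = float('inf')
            for m in pos.moves():
                best = min(best, alphabeta(pos.make(m), depth - 1,
                                           alpha, beta, True))
                beta = min(beta, best)
                if alpha >= beta:     # a reply holds the gain below
                    break             # alpha: examine no other replies
            return best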

Some of the aspects of game playing have more general
significance in artificial intelligence and in the study of
intelligence in general. For example, the analysis of problems into
subproblems that can be studied separately at first, with their
interactions being combined later, is a general phenomenon. The
depth-first and breadth-first methods of tree searching and the
alpha-beta method of cutting off tree search are relevant to problems
other than games. (Alpha-beta is relevant to games against nature
arising from the search for worst-case solutions to problems.) For
an extended discussion of tree search see (Nilsson 1971), and for a
discussion of particular problems that have been worked on, see
(Slagle 1971).
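
For readers who want the two orderings spelled out, here is a
schematic Python version of depth-first and breadth-first search over
a tree of positions; successors and is_goal are assumed, illustrative
functions, and no pruning is shown.

    from collections import deque

    def depth_first(root, successors, is_goal):
        """Expand the most recently generated node first."""
        stack = [root]
        while stack:
            node = stack.pop()
            if is_goal(node):
                return node
            stack.extend(successors(node))
        return None

    def breadth_first(root, successors, is_goal):
        """Expand the oldest node first: shallow nodes before deep."""
        queue = deque([root])
        while queue:
            node = queue.popleft()
            if is_goal(node):
                return node
            queue.extend(successors(node))
        return None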

Another moral that arises from the experiments with different
games is the invalidity of identifying a specific intellectual
mechanism with intelligence. The prize example of this would be the
identification of intelligence in rats with maze-learning ability. We
can easily make a maze-running computer program better than any rat
or human, but we would not be inclined to regard it as especially
intelligent, because we would not have to put into it many of the
known intellectual mechanisms.

So far it has usually turned out that once an intellectual
mechanism is discovered, it seems obvious, even though people wrote
programs for years without including it. This leads us to hope that
the source of difficulty in understanding intelligent behavior is bad
methodology, and that once we know what to look for, the mechanisms,
at least of conscious thought, will not be difficult to observe. At
present, it seems that most theories of cognitive processes are
clearly wrong in that either no mechanism is described or, if it were
programmed, it would not produce the results ascribed to it.

We can also derive some instructive lessons from the work in
computer theorem proving. Some of the mathematicians who did the
first work in computer theorem proving were very optimistic about
what theorems the programs they proposed would be able to prove. See
(Davis and Putnam 1960). In fact, the performance of the first
programs was extremely disappointing, and after ten years of effort,
programs are just on the verge of being useful in a few very formal
and unintuitive branches of mathematics. Even the present programs
do not use examples well to guide the formal reasoning and do not
have mechanisms that allow the methods of reasoning to depend
properly on the nature of what is to be proved.

PATTERN RECOGNITION, PATTERN DESCRIPTION AND PATTERN MATCHING

Another problem that was tackled soon after the availability
of computers was pattern recognition. As usually posed in the 1950s,
the problem was to choose which of a number of classes a stimulus
belongs to. Classifying dot arrays as representing letters of the
alphabet was much studied. This problem lent itself well to
mathematical analysis, because one could imagine that the image
represented an ideal signal to which noise had been added. Other
approaches represented the signals as points in a many-dimensional
space and the signals in a class as a point set in this space, and it
was hoped that the different classes could be separated by passing
hyperplanes between them or in some other mathematically simple way.
Algorithms for learning how to classify signals were also susceptible
to mathematical treatment. Initially, some people were very
optimistic about these methods and even proposed applying them to
situations where heuristic programming was being applied, but the
long-run success was quite limited. In the first place, the
mathematical methods were applicable only to simple problems, but
even more seriously it has turned out that classification is the
wrong problem; a robot or a person that has to look at parts on a
table and assemble a given structure out of suitable parts cannot
succeed if its pattern handling ability is limited to classifying the
scene as a whole into a finite number of categories. Instead,
programs have had to be written that describe the scene as a list of
objects and describe each object as made up of parts. Objects
bounded by flat faces can be described, at least in principle, by
present programs, but we still have no satisfactory ideas on how to
describe irregular objects such as plants or heads with hair on them.
Clearly, it is unreasonable to describe each hair, but what kind of
approximate description should be used?

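The hyperplane idea can be illustrated by the perceptron
rule, which adjusts a weight vector until it defines a plane
separating two point sets, when such a plane exists. The following
Python sketch is a generic reconstruction of the 1950s method, not
any particular published program.

    def train_perceptron(samples, passes=100):
        """samples: list of (vector, label) pairs, label +1 or -1.
        Returns a weight vector w whose last component is the bias."""
        dim = len(samples[0][0])
        w = [0.0] * (dim + 1)               # weights plus bias term
        for _ in range(passes):
            errors = 0
            for x, label in samples:
                xb = list(x) + [1.0]        # append 1 for the bias
                activation = sum(wi * xi for wi, xi in zip(w, xb))
                if label * activation <= 0: # misclassified: move the plane
                    w = [wi + label * xi for wi, xi in zip(w, xb)]
                    errors += 1
            if errors == 0:                 # a separating plane was found
                break
        return w
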
It seems to me that the psychological work of Jerome Bruner
on concepts, described in his book "A Study of Thinking", is similar
in spirit to that of the pattern recognizers and is subject to some
of the same criticisms. Bruner considered a concept to be a Boolean
combination of elementary concepts and studied experimentally the
strategies used by human subjects in inducing a concept when
presented with a sequence of stimuli and told whether they were
examples of the unknown concept. As a counter-example to this
concept of concept, consider the following description of a class of
plane figures: it consists of four line segments, one vertical and
three horizontal, the left ends of the horizontals coinciding with
the top, middle, and bottom of the vertical. The top and bottom
horizontals are the same length but shorter than the vertical, and
the middle horizontal is no longer than the other two. We have here
a description of the letter E. However, the concept is not
representable as a Boolean combination of properties of the figure as
a whole. It is, however, quite naturally representable by a
predicate calculus formula such as

    H(x) ∧ H(y) ∧ H(z) ∧ V(w)
    ∧ left(x) = top(w) ∧ left(y) = middle(w) ∧ left(z) = bottom(w)
    ∧ length(y) ≤ length(x) = length(z) < length(w),

where it must be stated in some added formalism that an E consists of
a quadruplet (x,y,z,w) where x, y, z, and w satisfy the above
formula.
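
The same point can be put in programming terms: the concept
is a predicate over a quadruple of parts rather than over the figure
as a whole. A minimal Python sketch, with an invented representation
of line segments, might read:

    def is_E(x, y, z, w):
        """x, y, z are horizontal segments and w a vertical one; each
        has a length, the horizontals a left end, and the vertical a
        top, middle, and bottom point."""
        return (x.left == w.top and
                y.left == w.middle and
                z.left == w.bottom and
                x.length == z.length and    # top and bottom equal
                y.length <= x.length and    # middle no longer than they
                x.length < w.length)        # and shorter than the vertical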

More recently, ideas of pattern matching originating in
formal linguistics have been applied to pattern description. It is
too early to say how successful this will be, but some extension of
the ideas to take into account partial occlusion of an object by
others seems to be required.


LEARNING

Considerable effort in AI has been put into making programs
that learn from their experience. Samuel's checker program contained
two forms of learning. The first was rote learning, in which the
values assigned to positions by look-ahead were stored, so that if
the program encountered the position anywhere on the move tree in a
subsequent game, the value could be used without further look-ahead.
The second was learning the best values of the parameters that
determine how positions are evaluated. This is done by going through
book games and adjusting the parameters so that they propose the
moves that the experts consider good as often as possible. Similar
learning schemes have been used by others.
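
The rote-learning half of this scheme amounts to caching
backed-up values. The following Python sketch shows the idea only;
the position representation and the look_ahead search routine are
assumed, and Samuel's actual bookkeeping was considerably more
elaborate.

    stored_values = {}        # position -> value backed up by search

    def evaluate(position, depth, look_ahead):
        """look_ahead(position, depth) stands for the program's usual
        tree search; positions must be hashable."""
        if position in stored_values:
            return stored_values[position]   # reuse an old look-ahead
        value = look_ahead(position, depth)  # otherwise search afresh
        stored_values[position] = value      # and remember the result
        return value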

The big problem in learning is generality. Thus Samuel's
checker program is a sucker for the following stratagem, provided you
can make the situation come up in the game. Suppose there are white
men at squares 20 and 28 and a black king at 19. Suppose further
that black has one more man than white on the rest of the board. If
the configuration on squares 19, 20, and 28 is allowed to persist
while the remaining men on each side become kings, then black will
win, because his king holds two white men, leaving him with a
majority on the rest of the board which will enable him to exchange
white down and win. Therefore, white has to use his forces to drive
away the king on 19 and release his men to become kings. Samuel's
program may happen to do this, because it wants to advance its men,
but it will be just as happy to advance other men if this is easier.
Look-ahead doesn't help, because the ultimate disaster is too far
away.

Not only does the program not know about this stratagem,
but it can't even be told, because no setting of its parameters
corresponds to this knowledge. Of course, the program can easily be
modified to use this stratagem, but this is like educating humans
through brain surgery.

This suggests trying to divide the problem of computer
learning into two parts. First comes the problem of representing
information in the computer in such a way that small changes in the
behavior of the program correspond to small changes in the
representation. It is also desirable that these changes in
representation be expressible as additions, without requiring a
knowledge of the complete structure of the program, for certainly one
human can teach another checker stratagems without knowing the
structure of his brain. Only after we had a teachable program would
we try to make one that could learn for itself.
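
One concrete reading of "changes expressible as additions" is
a program whose knowledge is a list of independent rules, so that
teaching a stratagem is appending one more entry. The Python sketch
below is a schematic illustration of that goal, not a working checker
player.

    advice = []               # each rule: (applicable?, recommended action)

    def add_advice(condition, action):
        """Teach a stratagem without knowing the program's insides."""
        advice.append((condition, action))

    def consult_advice(position):
        for condition, action in advice:
            if condition(position):          # first applicable rule wins
                return action(position)
        return None                          # fall back on ordinary search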

These ideas have led to a project called the Advice Taker:
making a program that can be told anything that can be expressed in
ordinary language. This leads to the epistemological problems
discussed in the next section.


EPISTEMOLOGY

The preceding discussion has concerned programs that use
particular intellectual mechanisms to solve particular classes of
problems. What must we do to give a machine the same flexible
ability to attack a wide class of problems that a human has? There
is no general agreement about this among AI researchers, so I shall
give my own views and their consequences for the study of natural
intelligence.

Our first step is to try to break the problem of AI into two
parts, the epistemological part and the heuristic part. The
epistemological part is the problem of how to represent in the memory
of the computer what the program "believes" about the present state
of the world, its "laws of motion", and the facts that determine the
consequences of the actions it may take. The heuristic part of the
AI problem starts from such a representation and is concerned with
the procedures that decide what to do to achieve a goal given the
facts. Most of the work in artificial intelligence, including the
work described in the previous parts of this paper, has concerned the
heuristic problem.

The epistemological problem for AI is presently in a state
where it gets more complicated the more we look at it. Here are some
of the considerations:


1. Laplace proposed to represent the state of the world by
specifying the positions and velocities of all the particles and
offered to predict the future given the force law. Modern physics is
similar; given the wave function, it offers to predict the
probabilities of the various future states. However, even if Laplace
were correct and we knew the force law, we could not use his
representation to describe, say, the situation in a room full of
people. In the first place, our information about the situation is
not given in terms of the positions and velocities of particles, and,
even if it were, we couldn't do the calculations to predict the
future. Therefore, we must ask ourselves what information is
actually available about situations and what laws allow us to
determine the effects of our actions. This information must include
descriptions and physical locations of objects, including animate
objects, beliefs and purposes of humans and other animate objects,
and qualitative forms of the laws of motion, such as the fact that
the chalk will break if I throw it on the floor. Included also is
the information that permits me to draw conclusions about partially
occluded objects, such as the facts that enable me to infer something
about the back of a person's head on seeing the front of it.

2. The main tool that has been used so far for expressing
this kind of information formally has been mathematical logic,
especially the predicate calculus. A number of programs, such as
that of (Green 1969), have been used for solving various problems by
expressing the particular data in the form of predicate calculus
sentences. (McCarthy and Hayes 1969) contains a discussion of some
of the problems that arise and proposes methods for their solution.

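For instance, in the style of such programs, the facts of a
simple fetching problem might be written as follows; the predicate
and function names here are invented for illustration.

    at(robot, a, s0) ∧ at(box, b, s0)
    ∀y ∀s. at(robot, y, result(go(y), s))

Getting the robot to the box then becomes the problem of proving
∃s. at(robot, b, s), and the proof yields the answer
s = result(go(b), s0).
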
3. The use of mathematical logic for expressing ordinary
reasoning encounters a number of difficulties, one of which is called
the qualification problem. Namely, all logical systems constructed
so far have the property that if a sentence p follows from a set A of
sentences, then p also follows from any set B of sentences such that
A ⊂ B. This does not seem to be so in ordinary language. Suppose a
person wants to go to Fresno from San Francisco, and after thinking
about the circumstances he has described, we conclude that he ought
to take a certain flight. Then he tells us that he is afraid to
travel by airplane. Thus, the conclusion that he ought to go by
airplane follows from the facts originally stated but not from these
facts supplemented by an additional fact. In fact, human processes
of reasoning allow jumping to certain conclusions provided these
conclusions do not contradict known facts. Efforts are being made to
make computer programs reason in such ways. In particular, Carl
Hewitt's (1971) programming language PLANNER makes it convenient to
write programs that will assume something unless they can prove the
contrary, when this is desirable.
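
The flavor of such a program can be suggested in a few lines
of Python. This is only a sketch of the "assume unless the contrary
is provable" behavior, in the spirit of PLANNER rather than in its
actual syntax; the propositions are strings, and provable is a
stand-in for real deduction.

    facts = {"wants_to_go(fresno)"}    # what has been asserted so far

    def provable(statement):
        # A stand-in for real deduction: here, simple membership.
        return statement in facts

    def assume_unless_contrary(conclusion, contrary):
        """Jump to conclusion provided contrary cannot be proved."""
        return None if provable(contrary) else conclusion

    plan = assume_unless_contrary("take_flight", "afraid_of_airplanes")
    # plan == "take_flight".  After facts.add("afraid_of_airplanes"),
    # the same call returns None: adding a fact withdraws the
    # conclusion, which no monotonic logical system can do.
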
CONCLUSIONS

Experience with trying to make computer programs that behave
intelligently has led me to the following conclusions, which may be
of interest to students of natural intelligence.

1. Many intellectual mechanisms are required by the problems
they are used to solve. Whether the mechanism is reached by genetic
selection, by learning in the environment, or by conscious reasoning,
it will be the same.

2. Before an organism or a machine can learn a fact or a
behavior, it must have a mechanism whereby this fact or behavior can
be represented in its memory structure.

3. Likewise, it is worthwhile to distinguish what an organism
knows from what it can say. I liked Dr. Premack's careful
investigation to determine that the chimpanzee knows about something
before trying to teach it to communicate about it.

4. To the extent that we can identify the mind as an
information structure, we can study it independently of the hardware.
This is fortunate, since anatomy and physiology have so far not been
able to give the student of intelligence much help.

5. Whether an idea for a mechanism for solving a class of
problems is well defined, or whether it will work, is often not
obvious. Attempting to write a computer program embodying the
mechanism will often decide the matter.

6. I don't know to what extent behaviorism is still a live
point of view in psychology, but it has never been a live possibility
in artificial intelligence. Computers have no interesting pure
stimulus-response properties above the level of "if you press the
button labelled 'START' on the printer, a yellow light labelled
'PRINTER ON' goes on". All reasoning about intelligent behavior
involves states of the machine that are not directly observable.

7. Many of the most important intellectual mechanisms can be
observed by analyzing how people perform complex tasks. The easiest
technique is to observe oneself. Many errors will be caught by
programming a computer to carry out the task in the proposed manner.


REFERENCES

Brudno
Bruner, J.S., Goodnow, J.J. and Austin, G.A. (1956) A Study of
        Thinking. New York: John Wiley.
Davis, M. and Putnam, H. (1960) A Computing Procedure for
        Quantification Theory. J. ACM, 7, 201.
Gelernter, H. (1959) Realization of a Geometry Theorem Proving
        Machine. Proc. ICIP. Paris: UNESCO House.
Greenblatt, R.D., Eastlake, D.E. and Crocker, S.D. (1967) The
        Greenblatt Chess Program. Proc. FJCC. Washington: Thompson
        Book Co.
Hart, T.P. and Edwards, D.J. (1961) The Tree Prune (TP) Algorithm.
        M.I.T. AI Project: Memo No. 30.
Hewitt, Carl (1971) Description and Theoretical Analysis (using
        Schemata) of Planner: A Language for Proving Theorems and
        Manipulating Models in a Robot. Ph.D. Thesis, Massachusetts
        Institute of Technology.
McCarthy, J. (1959) Programs with Common Sense. Mechanisation of
        Thought Processes, Volume I. London: HMSO.
McCarthy, J. and Hayes, P. (1969) Philosophical Problems from the
        Standpoint of Artificial Intelligence. Machine Intelligence 4,
        pp. 463-502 (eds. Meltzer, B. and Michie, D.). Edinburgh:
        Edinburgh University Press.
Newell
Nilsson, N.J. (1971) Problem-Solving Methods in Artificial
        Intelligence. New York: McGraw-Hill.
Russell, S.R. (1963) Improvements in LISP Debugging. Stanford
        Artificial Intelligence Project: Memo No. 10.
Russell, R. (1964) Kalah - The Game and the Program. Stanford
        Artificial Intelligence Project: Memo No. 22.
Russell, R. (1964) Improvements to the Kalah Program. Stanford
        Artificial Intelligence Project: Memo No. 23.
Russell, R. (19  ) Personal Communication.
Ryder, J.L. (1971) Heuristic Analysis of Large Trees as Generated in
        the Game of Go. Ph.D. Thesis, Stanford University.
Samuel, A.L.
Shannon, C. (1950) Programming a Digital Computer for Playing Chess.
        Philosophical Magazine, 41, 356-375.
Simon
Slagle, J.R., ed. (1971) Artificial Intelligence: The Heuristic
        Programming Approach. New York: McGraw-Hill.
Turing, A.M. (1950) Computing Machinery and Intelligence. Mind, 59,
        433-460.